Structural Parsing of Natural Language Text in Tamil Language Using Dependency Language Model

Authors

  • M. SELVAM
  • A. M. NATARAJAN
  • R. THANGARAJAN
Abstract

Parsing is an important process in Natural Language Processing (NLP) and Computational Linguistics, used to understand the syntax and semantics of natural language sentences as constrained by a grammar. A parser is a computational system that processes an input sentence according to the productions of the grammar and builds one or more constituent structures that conform to it. The interpretation of natural language text also depends on context. Statistical methods focus mainly on semantics, whereas structural methods focus on syntax. Although statistical parsing performs better through N-gram probabilities and a large vocabulary, it has disadvantages such as poor support for free word order and long-term relationships. Structural parsing addresses these to some extent, but it becomes very tedious for a large-vocabulary corpus. To accommodate both syntax and semantics, a structural component has to be incorporated into the statistical approach. To add the structural component and balance the vocabulary size, Lexicalized and Statistical Parsing (LSP) is employed with a phrase structure or dependency structure language model. A phrase structure language model is not suitable for maintaining long-term relationships in large, complex sentences. Also, a large training set is needed to improve performance. When dependency relations are applied among words, direct relationships can be established. This paper focuses on dependency-based lexicalized and statistical parsing of natural language text in Tamil. LSP needs a dependency-based hybrid language model, which supports long-term relationships and free word order better than phrase structure does. A new Part of Speech (POS) tag set for Tamil with more than 500 tags has been proposed, giving wider coverage. A dependency tag set with 31 tags is used to obtain dependency relations. A medium-vocabulary dependency Treebank of 326 manually annotated sentences, comprising more than 5000 words, has been developed. The dependency language model has been trained on this Treebank. The resulting structural parser gives better results and covers the features described above.
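The abstract does not give the exact form of the dependency language model, so the following is only a minimal sketch of the general idea, assuming the model estimates P(dependent | head, relation) from treebank counts with add-one smoothing. The toy treebank, the relation labels (`subj`, `obj`, `root`), and the function names `train`, `prob`, and `score` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy treebank (not the paper's data): each sentence is a list
# of (dependent, head, relation) triples; "ROOT" marks the sentence head.
TREEBANK = [
    [("boy", "reads", "subj"), ("book", "reads", "obj"), ("reads", "ROOT", "root")],
    [("girl", "reads", "subj"), ("letter", "reads", "obj"), ("reads", "ROOT", "root")],
]


def train(treebank):
    """Count (head, relation, dependent) triples and (head, relation) contexts."""
    triple_counts = Counter()
    context_counts = Counter()
    vocab = set()
    for sentence in treebank:
        for dependent, head, relation in sentence:
            triple_counts[(head, relation, dependent)] += 1
            context_counts[(head, relation)] += 1
            vocab.add(dependent)
    return triple_counts, context_counts, vocab


def prob(model, dependent, head, relation):
    """Add-one smoothed estimate of P(dependent | head, relation)."""
    triple_counts, context_counts, vocab = model
    numerator = triple_counts[(head, relation, dependent)] + 1
    denominator = context_counts[(head, relation)] + len(vocab)
    return numerator / denominator


def score(model, sentence):
    """Product of the smoothed probabilities of every dependency in a sentence."""
    p = 1.0
    for dependent, head, relation in sentence:
        p *= prob(model, dependent, head, relation)
    return p


model = train(TREEBANK)
```

Because each dependency is scored from its head and relation rather than from its neighbours in the word sequence, `score` does not depend on the order in which the triples are listed. This is one way to read the abstract's claim that dependency relations cope better with free word order and long-term relationships than N-gram context does.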

Similar resources

Lexicalized and Statistical Parsing of Natural Language Text in Tamil using Hybrid Language Models

Parsing is an important process in Natural Language Processing (NLP) and Computational Linguistics, used to understand the syntax and semantics of natural language (NL) sentences as constrained by a grammar. A parser is a computational system that processes an input sentence according to the productions of the grammar and builds one or more constituent structures that conform to the grammar...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging together with dependency parsing in a pipeline mode. Unfortunately, in pipel...

Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model

Parsing is important in Linguistics and Natural Language Processing to understand the syntax and semantics of a natural language grammar. Parsing natural language text is challenging because of the problems like ambiguity and inefficiency. Also the interpretation of natural language text depends on context based techniques. A probabilistic component is essential to resolve ambiguity in both syn...

An efficiency dependency parser using hybrid approach for tamil language

Natural language processing is an active research area across the country. Parsing is one of the crucial tools in a language analysis system; it aims to predict the structural relationships among the words in a given sentence. Many researchers have already developed language tools, but their accuracy does not yet meet human expectations, so research continues. Machine...

Publication date: 2008